Optimizing the NPB CG benchmark for multi-core AMD Opteron microprocessors

Author

  • Stephen Whalen
Abstract

CG approximates the largest eigenvalue of a sparse, symmetric, positive definite matrix, using inverse iteration [3]. The matrix is generated by summing outer products of sparse vectors, with a fixed number of nonzero elements in each generating vector. The matrix sizes and total number of nonzero elements (“computed nonzeros,” following [3]) are listed in Table 1. The benchmark computes a given number of eigenvalue estimates, referred to as “outer iterations,” using 25 iterations of the conjugate gradient method to solve the linear system in each outer iteration.
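The structure described above (outer iterations of inverse iteration, each solving a linear system with 25 conjugate gradient steps) can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the benchmark's reference code: the function names, the dense matrix storage, the `shift` parameter, the 0.1 diagonal term, and the early-exit tolerance are all hypothetical choices made for the sketch. Inverse iteration in general converges toward the eigenvalue nearest the chosen shift.

```python
import random


def make_matrix(n, vectors, nonzeros_per_vector, seed=0):
    """Dense stand-in for the sparse matrix: a sum of outer products of
    sparse vectors. The 0.1 diagonal term is an assumption of this sketch,
    added only to keep the matrix positive definite."""
    rng = random.Random(seed)
    a = [[0.0] * n for _ in range(n)]
    for _ in range(vectors):
        idx = rng.sample(range(n), nonzeros_per_vector)
        val = {i: rng.uniform(-1.0, 1.0) for i in idx}
        for i, vi in val.items():
            for j, vj in val.items():
                a[i][j] += vi * vj
    for i in range(n):
        a[i][i] += 0.1
    return a


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(a, x):
    return [dot(row, x) for row in a]


def cg_solve(a, b, shift=0.0, iterations=25):
    """Run up to `iterations` CG steps on (A - shift*I) z = b from z = 0."""
    z = [0.0] * len(b)
    r = list(b)
    p = list(b)
    rho0 = rho = dot(r, r)
    for _ in range(iterations):
        # Safeguard for tiny problems (not part of the benchmark description):
        # stop once the residual has effectively vanished.
        if rho <= 1e-30 * rho0:
            break
        q = [aq - shift * pi for aq, pi in zip(matvec(a, p), p)]
        alpha = rho / dot(p, q)
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return z


def inverse_iteration(a, outer_iterations, shift=0.0):
    """Return one eigenvalue estimate per outer iteration."""
    n = len(a)
    x = [1.0 / n ** 0.5] * n  # normalized starting vector
    estimates = []
    for _ in range(outer_iterations):
        z = cg_solve(a, x, shift)           # z approximates (A - shift*I)^-1 x
        estimates.append(shift + 1.0 / dot(x, z))
        norm = dot(z, z) ** 0.5
        x = [zi / norm for zi in z]         # renormalize for the next pass
    return estimates
```

For example, `inverse_iteration(make_matrix(6, 4, 3), 5)` returns five successive estimates, one per outer iteration. With a zero shift the solves stay positive definite, which is the case conjugate gradient is designed for.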

Similar resources

Observation and analysis of the multicore performance impact on scientific applications

With the proliferation of large multicore high-performance computing systems, application performance is often negatively affected. This paper provides benchmark results for a representative workload from the Department of Defense High-performance Computing Modernization Program. The tests were run on a Cray XT-3 and XT-4, which use dual- and quad-core AMD Opteron microprocessors. We use a combin...

Benchmarking CMSSW on Intel and AMD single-core, dual-core and quad-core systems

We have benchmarked dual-processor quad-core AMD Opteron 2350 and 2356, dual-processor quad-core Intel Xeon E5345, single processor quad-core Intel Xeon X5472, dual-processor dual-core AMD Opteron 2214, dual-processor single-core Intel Xeon EM64T and single-processor single-core Intel Xeon EM64T systems using a CMSSW event simulation and reconstruction application. The results are presented in ...

Understanding and Mitigating Multicore Performance Issues on the AMD Opteron Architecture

Over the past 15 years, microprocessor performance has doubled approximately every 18 months through increased clock rates and processing efficiency. In the past few years, clock frequency growth has stalled, and microprocessor manufacturers such as AMD have moved towards doubling the number of cores every 18 months in order to maintain historical growth rates in chip performance. This document...

A Scalability Study of Columbia using the NAS Parallel Benchmarks

The Columbia system at the NASA Advanced Supercomputing (NAS) facility is a cluster of 20 SGI Altix nodes, each with 512 Itanium 2 processors and 1 terabyte (TB) of shared-access memory. Four of the nodes are organized as a 2048-processor capability-computing platform connected by two low-latency interconnects— NUMALink4 (NL4) and InfiniBand (IB). To evaluate the scalability of Columbia with re...

A Comparative Performance Evaluation of Multi Processor Multi Core Server Processor Architectures on Enterprise Middleware Performance

In this paper we describe the performance evaluation and comparison of server based “Enterprise Middleware” frameworks on multi-processor multi-core server processor architectures. We experimented with a 'single processor quad core Intel Xeon' server processor and a 'dual processor dual core multiprocessor AMD Opteron'. We also discuss the expected enterprise middleware framework execution performan...

Publication date: 2007